[ADD] add transformer_int4_fp16_loadlowbit_gpu_win api #11511
Conversation
Please update the config here: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/all-in-one/README.md#config; and update the description here: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/all-in-one/README.md#optional-save-model-in-low-bit
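For reference, a hedged sketch of what the config.yaml entry might look like once the README is updated; the field names follow the existing all-in-one benchmark config format, and the values here are illustrative:

```yaml
# config.yaml (excerpt) -- illustrative values, not the PR's actual diff
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'
low_bit: 'sym_int4'      # the low-bit format the model was saved in
cpu_embedding: False     # whether to run the embedding layer on CPU
test_api:
  # new API added by this PR: load a saved low-bit model and run it in fp16 on an Intel iGPU (Windows)
  - 'transformer_int4_fp16_loadlowbit_gpu_win'
```

As with the other loadlowbit APIs, this assumes the model has already been saved in low bit at `<model_path>-<low_bit>` beforehand, which is what the `model_path + '-' + low_bit` in the snippet below points at.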
Let's also add:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Load the saved low-bit model in fp16, then move it to the Intel GPU.
model = AutoModelForCausalLM.load_low_bit(model_path + '-' + low_bit, optimize_model=True, trust_remote_code=True,
                                          torch_dtype=torch.float16, use_cache=True, cpu_embedding=cpu_embedding).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path + '-' + low_bit, trust_remote_code=True)
model = model.to('xpu')
```
Let's use `model = model.half().to('xpu')`, and remove `torch_dtype=torch.float16` for `run_transformer_int4_fp16_loadlowbit_gpu_win` for now.
Due to the bug here: https://github.com/analytics-zoo/nano/issues/1489
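Putting the two comments together, the loading path would look roughly like this; a sketch assuming the same variable names as the snippet above, not the PR's final diff:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# No torch_dtype override at load time; the fp16 cast happens afterwards.
model = AutoModelForCausalLM.load_low_bit(model_path + '-' + low_bit, optimize_model=True,
                                          trust_remote_code=True, use_cache=True,
                                          cpu_embedding=cpu_embedding).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path + '-' + low_bit, trust_remote_code=True)
# Cast to fp16 first, then move to the XPU, to work around the linked bug.
model = model.half().to('xpu')
```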
transformer_int4_fp16_loadlowbit_gpu_win api (…tics#11511)

* [ADD] add transformer_int4_fp16_loadlowbit_gpu_win api
* [UPDATE] add int4_fp16_lowbit config and description
* [FIX] fix run.py mistake
* [FIX] fix run.py mistake
* [FIX] fix indent; change dtype=float16 to model.half()
Description

Adds the `transformer_int4_fp16_loadlowbit_gpu_win` API for igpu-perf use.
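For context, the new test API slots into run.py's dispatch on `test_api`. As a rough illustration of the overall benchmark pattern (warm-up runs followed by timed `generate` calls), here is a minimal sketch; the function signature, arguments, and timing details are assumptions for illustration, not the PR's actual code:

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

def run_transformer_int4_fp16_loadlowbit_gpu_win(model_path, low_bit, cpu_embedding, prompt,
                                                 warm_up=1, num_trials=3, n_predict=32):
    # Illustrative sketch only: load the saved low-bit model, cast it to fp16
    # on the XPU, then time generation after a few warm-up iterations.
    # Assumes an XPU-enabled PyTorch build (e.g. via intel_extension_for_pytorch).
    model = AutoModelForCausalLM.load_low_bit(model_path + '-' + low_bit, optimize_model=True,
                                              trust_remote_code=True, use_cache=True,
                                              cpu_embedding=cpu_embedding).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_path + '-' + low_bit, trust_remote_code=True)
    model = model.half().to('xpu')

    input_ids = tokenizer.encode(prompt, return_tensors='pt').to('xpu')
    with torch.inference_mode():
        for _ in range(warm_up):
            model.generate(input_ids, max_new_tokens=n_predict)
        times = []
        for _ in range(num_trials):
            torch.xpu.synchronize()
            start = time.perf_counter()
            model.generate(input_ids, max_new_tokens=n_predict)
            torch.xpu.synchronize()
            times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```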